Improving Biomedical Document Retrieval by Mining Domain Knowledge
Abstract
When research articles introduce new findings or concepts, they typically relate them only to knowledge and domain concepts of immediate relevance. As a result, many domain concepts relevant to the article and its findings are omitted from the text, which may prevent the article from being retrieved by a search query. Approaches such as probabilistic latent semantic indexing (PLSI) address this limitation by projecting the terms in articles onto a lower-dimensional latent space and identifying the best possible matches in that space. Nevertheless, such approaches may not perform well enough if the number of explicit knowledge concepts in the articles is too small compared to the amount of knowledge in the domain. The objective of this paper is to address the problem by exploiting a domain knowledge layer: a rich network of associations among knowledge concepts in the domain of interest. We present a new document retrieval framework that i) extracts associations among knowledge concepts from many documents in the literature corpus and ii) exploits them to improve the retrieval of relevant documents. We test our approach on the problem of retrieving biomedical documents and show that it outperforms the standard Lucene and BM25 information-retrieval methods.
Similar resources
Document Clustering Based on Ontology and a Fuzzy Approach
Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, one of the important techniques of text mining, is the unsupervised classification of similar documents into different groups. The most important step...
A Web-Mining Approach to Disambiguate Biomedical Acronym Expansions
Named Entities Recognition (NER) has become one of the major issues in Information Retrieval (IR), knowledge extraction, and document classification. This paper addresses a particular case of NER, acronym expansion (or definition) when this expansion does not exist in the document using the acronym. Since acronyms may obviously expand into several distinct sets of words, this paper provides nin...
Improving Keyphrase Extraction from Biomedical Documents Using Domain Specific Feature Set
Keyphrases enable the reader to quickly determine whether the given article is suitable for the reader’s digest. Keyphrases are also important for medical document retrieval and text mining research. Sometimes, the author-assigned Keyphrases or keywords available with the articles are too limited to represent the topical content of the articles. Many medical documents also do not come with auth...
Assessment of approximate string matching in a biomedical text retrieval problem
Text-based search is widely used for biomedical data mining and knowledge discovery. Character errors in the literature affect the accuracy of data mining. Methods for solving this problem are being explored. This work tests the usefulness of the Smith-Waterman algorithm with affine gap penalty as a method for biomedical literature retrieval. Names of medicinal herbs collected from herbal medicine...
Text Mining in Biomedical Domain with Emphasis on Document Clustering
OBJECTIVES: With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. METHODS: This paper reviews text mining processes in detail and the software tools a...
Publication date: 2009